Crawlets: Agents for High Performance Web Search Engines
Authors
Abstract
Some of the reasons for unsatisfactory performance of today’s search engines are their centralized approach to web crawling and lack of explicit support from web servers. We propose a modification to conventional crawling in which a search engine uploads simple agents, called crawlets, to web sites. A crawlet crawls pages at a site locally and sends a compact summary back to the search engine. This not only reduces bandwidth requirements and network latencies, but also parallelizes crawling. Crawlets also provide an effective means for achieving the performance gains of personalized web servers, and can make up for the lack of cooperation from conventional web servers. The specialized nature of crawlets allows simple solutions to security and resource control problems, and reduces software requirements at participating web sites. In fact, we propose an implementation that requires no changes to web servers, but only the installation of a few (active) web pages at host sites.
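The abstract describes the crawlet protocol only at a high level. Below is a minimal sketch of what a crawlet might look like, assuming the hosting site serves its pages over HTTP and the search engine accepts gzip-compressed JSON summaries at a hypothetical /submit endpoint; the crawl logic, the term-frequency summary format, and all URLs are illustrative assumptions, not the paper's implementation.

```python
# Sketch of a crawlet: crawl pages locally at the hosting site, build a
# compact per-page summary, and ship only the compressed summary back to
# the search engine. All endpoints and formats here are assumptions.
import gzip
import json
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkAndTextParser(HTMLParser):
    """Collects hyperlinks and visible text from one HTML page."""
    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)

def crawl_site(start_url, max_pages=100):
    """Crawl pages local to the start URL and build a compact
    per-page term-frequency summary (the crawlet's payload)."""
    site = urlparse(start_url).netloc
    seen, frontier, summary = set(), [start_url], {}
    while frontier and len(seen) < max_pages:
        url = frontier.pop()
        if url in seen or urlparse(url).netloc != site:
            continue  # stay local; off-site links remain the engine's job
        seen.add(url)
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # skip unreachable or malformed pages
        parser = LinkAndTextParser()
        parser.feed(html)
        # Compact summary: the top terms of each page, not the page itself.
        freq = {}
        for word in " ".join(parser.text_parts).lower().split():
            freq[word] = freq.get(word, 0) + 1
        top = sorted(freq.items(), key=lambda kv: -kv[1])[:20]
        summary[url] = dict(top)
        frontier.extend(urljoin(url, link) for link in parser.links)
    return summary

def send_summary(engine_url, summary):
    """Ship one compressed summary back instead of many full pages."""
    body = gzip.compress(json.dumps(summary).encode("utf-8"))
    req = urllib.request.Request(
        engine_url, data=body,
        headers={"Content-Encoding": "gzip",
                 "Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=10)

if __name__ == "__main__":
    # Illustrative endpoints; a real deployment would be configured by
    # the search engine when it uploads the crawlet.
    pages = crawl_site("http://localhost:8000/index.html")
    send_summary("http://search-engine.example/submit", pages)
```

The point of the sketch is the last step: only the compressed summary crosses the network, so bandwidth scales with the size of the extracted index data rather than with the size of the pages, which is the saving the abstract claims.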
Similar Resources
A New Hybrid Method for Web Pages Ranking in Search Engines
There are many algorithms for optimizing search engine results; ranking takes place according to one or more parameters, such as backward links, forward links, content, and click-through rate. The quality and performance of these algorithms depend on these parameters. Ranking is one of the most important components of a search engine, as it represents the degree of the vitality...
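The snippet names several ranking parameters without showing how they combine. A minimal sketch of one plausible hybrid score follows, with purely illustrative weights and each signal assumed normalized to the range 0 to 1; none of this is taken from the cited paper.

```python
# Hypothetical hybrid ranking score combining the parameters the snippet
# lists; the weights and the 0-1 scaling of each signal are assumptions.
def hybrid_rank(backlinks: float, forward_links: float,
                content_score: float, ctr: float) -> float:
    """Weighted linear combination of normalized ranking signals."""
    weights = {"back": 0.4, "forward": 0.1, "content": 0.35, "ctr": 0.15}
    return (weights["back"] * backlinks
            + weights["forward"] * forward_links
            + weights["content"] * content_score
            + weights["ctr"] * ctr)

# Example: a page with many backlinks and good content relevance.
print(hybrid_rank(backlinks=0.8, forward_links=0.2,
                  content_score=0.7, ctr=0.4))  # -> 0.645
```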
A Technique for Improving Web Mining using Enhanced Genetic Algorithm
The World Wide Web is growing at a very fast pace and makes a lot of information available to the public. Search engines use conventional methods to retrieve information on the Web; however, their search results can still be refined, and their accuracy is not high enough. One of the methods for web mining is evolutionary algorithms, which search according to the user's interests...
A New Model for Phrase Search Based on Minimum Weighted Displacement
Finding high-quality web pages is one of the most important tasks of search engines. The relevance between the documents found and the searched query depends on the user's perception, which increases the complexity of ranking algorithms. The other issue is that users often explore just the first 10 to 20 results, while millions of pages related to a query may exist. So search engines have to use sui...
Comparison of Three Vertical Search Spiders
The Web has plenty of useful resources, but its dynamic, unstructured nature makes them difficult to locate. Search engines help, but the number of Web pages now exceeds two billion, making it difficult for general-purpose engines to maintain comprehensive, up-to-date search indexes. Moreover, as the Web grows ever larger, so does information overload in query results. A general-purpose search e...
Intelligent Data Mining and Information Retrieval from World Wide Web for E-Business Applications
There is a large increase in the amount of information available on the World Wide Web and in the number of online databases. This information abundance increases the complexity of locating relevant information. Such complexity drives the need for improved and intelligent search and retrieval engines. Current search engines are primarily passive instruments. Intelligent Agents can improve the pe...
Journal:
Volume Issue
Pages -
Publication date: 2001